Here we present global funding from the National Institute of Allergy and Infectious Diseases (NIAID) from 1990 to 2017. The data consist only of funding for projects on neglected tropical diseases worldwide. In addition to the neglected tropical diseases, we later added Influenza B, malaria, HIV/AIDS, and tuberculosis to illustrate differences between well-known illnesses and neglected tropical diseases. Using world map visualizations, we display the amount of funding received by countries during the study period. The GDP per capita of the funded countries is also shown. Two visualizations are shown below: a static map and an interactive map.
In what follows, we construct the same map but make it interactive. Clicking on a marker shows the information for that location.
Both maps (i.e. static by country and interactive by city/US state) show a cluster of funding in the US and Europe. There were several awards in South America and Asia as well. Australia also received a few awards over the years studied here. In general, the US has the most awards and the largest amount of funding.
In this subsection, we categorized the diseases into groups. For instance, all forms of leishmaniasis were grouped under leishmaniasis, Trypanosoma cruzi was placed under Chagas, and everything not covered by any of the categories illustrated below was grouped under “other”, for “other neglected tropical diseases”. Because malaria, Influenza B, TB, and HIV/AIDS are not considered neglected tropical diseases by either the CDC or the WHO, as displayed on their respective websites, we removed them from all analyses and reviewed the disease progression.
The plot of funding by disease shows an increasing pattern for most diseases. While all the funding data are noisy, funding toward certain diseases was more constant (e.g. Echinococcus, Hookworm). Dengue and Chagas showed the strongest increasing pattern.
Burden of disease is a concept developed in the 1990s by the Harvard School of Public Health, the World Bank, and the World Health Organization (WHO) to describe death and loss of health due to diseases (WHO). Since then, the definition has evolved into the metrics listed below:
Disability-Adjusted Life Years (DALYs): One DALY is one lost year of “healthy life” due to disability, early death, or ill health. Summed across a population, DALYs measure the gap between the current health status and an ideal situation in which the population lives free of disease (Source: WHO).
Years of Life Lost (YLLs): Number of deaths multiplied by the standard life expectancy at the age at which death occurs (source: WHO).
Years Lost due to Disability (YLDs): Number of incident cases multiplied by the average duration of the disease and a weight factor that reflects the severity of the disease on a scale from 0 (perfect health) to 1 (dead) (Source: WHO).
Mortality: Number of deaths due to the disease.
Incidence: Number of new cases of disease that develop in a given period of time (Source: MedicineNet).
Prevalence: Number of cases of the disease that are present in a particular population at a given time (Source: MedicineNet).
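The YLL and YLD definitions above translate directly into arithmetic. The sketch below illustrates them together with the standard WHO relation DALY = YLL + YLD. The function names and all input values are hypothetical and are not taken from the data used in this work.

```python
def years_of_life_lost(deaths, life_expectancy_at_death):
    """YLL = number of deaths x standard life expectancy at the age of death."""
    return deaths * life_expectancy_at_death


def years_lost_to_disability(incident_cases, average_duration_years, disability_weight):
    """YLD = incident cases x average duration x disability weight (0 to 1)."""
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must lie between 0 and 1")
    return incident_cases * average_duration_years * disability_weight


def disability_adjusted_life_years(yll, yld):
    """DALY = YLL + YLD."""
    return yll + yld


# Hypothetical inputs, for illustration only.
yll = years_of_life_lost(deaths=10, life_expectancy_at_death=30.0)
yld = years_lost_to_disability(
    incident_cases=100, average_duration_years=2.0, disability_weight=0.25
)
daly = disability_adjusted_life_years(yll, yld)
```

In this toy case the deaths contribute far more to the DALY total than the disability does, which is one reason the individual metrics can show different trends for the same disease.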
These metrics are used to evaluate disease burden in the analysis. Moreover, the most recent disease burden estimates from the Global Burden of Disease (GBD) project are available with a two-year lag. Therefore, all comparisons of burden to funding in this work use a similar two-year gap (i.e. funding from “1992 to 2019” vs. disease burden from “1990 to 2017”). The disease burden data were obtained from the GBD project.
In what follows, we compare the disease burdens to the funding for the different disease categories. We first deflated all funding (i.e. adjusted it for inflation) to 1992 values in order to see the true association between funding and disease burden. The inflation rate was calculated using the Biomedical Research and Development Price Index (BRDPI).
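Deflating to a base year can be sketched as follows. This is a minimal illustration assuming the price index is available as one value per year. The index and funding values below are made up and are not actual BRDPI or NIAID figures, and the function name is hypothetical.

```python
def deflate_to_base_year(nominal_by_year, index_by_year, base_year=1992):
    """Express nominal amounts in base-year dollars using a price index.

    real = nominal * index[base_year] / index[year]
    """
    base_index = index_by_year[base_year]
    return {
        year: amount * base_index / index_by_year[year]
        for year, amount in nominal_by_year.items()
    }


# Hypothetical price index and nominal funding (millions of dollars).
price_index = {1992: 100.0, 1993: 103.0, 1994: 107.0}
nominal_funding = {1992: 50.0, 1993: 51.5, 1994: 53.5}

real_funding = deflate_to_base_year(nominal_funding, price_index, base_year=1992)
```

In this example the nominal funding grows each year, but only as fast as prices do, so the deflated series is flat. This is the kind of spurious trend the adjustment is meant to remove.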
Chagas, Leishmaniasis, and Echinococcus saw a decreasing trend in both funding and DALYs. Dengue, however, saw an increasing pattern in both, which suggests that funding has not reached the critical point that would allow the burden to start decreasing. Schistosomiasis saw an inverse-U-shaped pattern with its turning point around 2004-2006, which suggests an effect of funding on the burden.
As with DALYs, we observed several increasing patterns (e.g. Dengue). Echinococcus, however, saw a sharper decrease in YLLs. Burdens for several diseases (Lymphatic Filariasis, Hookworm, Trichuriasis) were not calculated because data were absent.
In contrast to DALYs and YLLs, Echinococcus saw an increase in YLDs. Dengue still shows an increasing pattern. Lymphatic Filariasis showed an inverse-U-shaped pattern, as seen for Echinococcus in DALYs, with its turning point around 2004-2006.
Mortality data did not exist for most of these neglected tropical diseases, so for those we plotted only the funding. Diseases with recorded deaths saw trends similar to those in DALYs, but with different magnitudes.
As in the previous cases, Lymphatic Filariasis and Hookworm have only zero incidence values, so their values are not displayed on the log scale. Leishmaniasis, however, shows a decreasing pattern up to 1998 and then starts increasing again. A couple of years later, the burden starts increasing again and, close to 2012, it starts decreasing again. Such behavior has not been observed with the other metrics.
Now, for the first time, we observe the burden for Leishmaniasis increasing through 2017. Overall, Dengue has been increasing regardless of the burden metric.
Here, we performed a regression analysis taking NIAID funding from 1997, 2007, and 2017 as the dependent variable and DALYs from 1995, 2005, and 2015, respectively, as the independent variable. Recall that, to account for inflation, all funding values were deflated to 1992. Two regression models were constructed: an unconstrained model (solid line) and a constrained model that assumes that no burden receives no funding (i.e. no intercept; dashed line). The R^2, which measures the proportion of the variance in the dependent variable (funding) that is explained by the independent variable (DALYs), is shown only for the unconstrained model. We also show the slope of the unconstrained model. This slope represents how many units of funding (in millions of dollars) we can expect, on average, for every additional unit (in millions) of DALYs.
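The two models can be sketched with ordinary least squares. This is a minimal illustration and not the code used to produce the plots. The function names are hypothetical and the data points are made up (DALYs in millions on x, deflated funding in millions of dollars on y).

```python
def fit_with_intercept(x, y):
    """Unconstrained least squares: y = intercept + slope * x. Returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def fit_through_origin(x, y):
    """Constrained least squares with no intercept: y = slope * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: DALYs (millions) and deflated funding (millions of dollars).
dalys = [1.0, 2.0, 3.0, 4.0]
funding = [3.0, 5.0, 7.0, 9.0]

slope, intercept, r_squared = fit_with_intercept(dalys, funding)
slope_no_intercept = fit_through_origin(dalys, funding)
```

With these toy values the unconstrained fit recovers a slope of 2 and an intercept of 1, while forcing the line through the origin gives a steeper slope. The two lines can therefore disagree noticeably when the unconstrained intercept is far from zero.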
From the plot above, it is clear that in 1995, 2005, and 2015 (the DALY years) HIV/AIDS, Malaria, and Tuberculosis are outliers, with HIV/AIDS being the largest. We note, however, that while there was a huge increase in funding toward HIV/AIDS from 1995 to 2005, its DALYs also increased tremendously, as illustrated on the x-axis. These graphs should therefore be read carefully as two-dimensional objects. Influenza B does not appear to be a strong outlier in any of the years measured (i.e. 1995, 2005, and 2015). The slopes are always positive, illustrating the average positive response of funding to burden. Moreover, the slopes increase from 1995 to 2015, which indicates that the response is not only positive but also grows over the years. We note, however, that this analysis does not establish that the funding response to disease burden is sufficient. Further work is needed to answer that question.
As in the analysis that included the non-neglected tropical diseases, the slopes were positive and showed an increasing pattern. Diseases below the regression line are underfunded, and those above it are overfunded, relative to the other funded diseases. With this in mind, certain diseases were underfunded throughout the years (e.g. Hookworm, Trichuriasis, Lymphatic Filariasis). Certain diseases, such as Dengue, were underfunded in 1995 and later became overfunded. We also point out the magnitude by which these diseases are underfunded or overfunded, which can be seen from the distance between the actual disease data and the regression line. For instance, while both Hookworm and Lymphatic Filariasis were underfunded in 2005, Hookworm was much more underfunded and fell outside the 99% confidence interval. This indicates which diseases future funding should focus on.
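The reading of points relative to the regression line amounts to looking at the sign and size of each residual. The sketch below assumes a fitted slope and intercept are already available. The disease labels, coefficients, and values are placeholders, not results from this analysis.

```python
def funding_residual(dalys, funding, slope, intercept):
    """Actual funding minus the funding predicted by the regression line."""
    return funding - (intercept + slope * dalys)


def classify(residual):
    """Negative residual: below the line (underfunded); positive: above (overfunded)."""
    if residual < 0:
        return "underfunded"
    if residual > 0:
        return "overfunded"
    return "on the line"


# Hypothetical fitted line and observations: (DALYs, funding).
slope, intercept = 2.0, 1.0
observations = {"Disease A": (3.0, 4.0), "Disease B": (1.0, 5.0)}

gaps = {
    name: funding_residual(d, f, slope, intercept)
    for name, (d, f) in observations.items()
}
labels = {name: classify(gap) for name, gap in gaps.items()}
```

The absolute value of each gap gives the magnitude discussed above, so diseases can be ranked by how far below the line they fall.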
In addition to the regression analysis of the relationship between funding and burden, we plotted the HIV/AIDS data to show that the sudden large increase in both funding and burden from 1995 to 2005 was not an artifact of the data but rather represents actual values. To this end, we plotted all burden metrics for HIV/AIDS, and all of them show a similar pattern. We did not, however, analyze these data in detail, as the focus of this project is on neglected tropical diseases.
Spearman correlations were computed between the funding and the disease burdens. The correlation table is attached separately (see the presentation slides). The code to perform these correlations is, however, provided in the chapter.rmd file.
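For readers without access to that file, Spearman correlation can be sketched as the Pearson correlation of the ranks, with tied values given their average rank. This is an independent illustration rather than the code from chapter.rmd, and the two series below are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical funding and burden series.
funding = [1.0, 2.0, 3.0, 4.0]
burden = [10.0, 20.0, 15.0, 40.0]
rho = spearman(funding, burden)
```

Because it works on ranks, this measure is less sensitive than Pearson correlation to extreme values such as the HIV/AIDS funding levels.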
In this section, we studied the level of disease burden in countries that received funding from NIAID. Our analysis focused on checking whether the level of disease burden correlates with a country’s poverty level. We used per capita GDP (obtained from the World Bank database) to evaluate countries’ poverty level. Because countries with larger populations may appear to have a higher burden, we extracted the population size (again from the World Bank database) from 1990 to 2017 and took the average of both the population size and the per capita GDP over 1990 to 2017. The average population size was used to calculate the per capita burden in a country (i.e. DALYs/population).
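The per capita calculation can be sketched as below. The text does not specify how the DALYs were aggregated over the period, so the single DALY figure used here is an assumption, as are the function name and all values.

```python
from statistics import mean


def per_capita_burden(dalys, population_by_year):
    """Burden divided by the average population over the years supplied."""
    return dalys / mean(population_by_year.values())


# Hypothetical country: population by year and an assumed DALY figure.
population = {1990: 1_000_000, 2017: 3_000_000}
country_dalys = 50_000.0

burden_per_person = per_capita_burden(country_dalys, population)
```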
All the data were scaled for ease of comparison. We plotted both the data for all countries and a subset that excludes countries lying outside the tropics in the northern or southern hemisphere, since these are not in tropical regions and are not expected to have neglected tropical diseases. These plots were included separately in the presentation and are therefore not in this RMD file.
The level of burden in a country changes depending on the disease burden metric used. However, Brazil, the Philippines, and Nicaragua saw a very high level of disease burden. Leishmaniasis, Chagas, and Schistosomiasis were the most concentrated diseases in Brazil, while Dengue, Lymphatic Filariasis, and Trichuriasis were the most concentrated in the other countries. As expected, the countries with the highest GDP per capita did not see a high level of disease burden, given that they are not in a tropical area.
To model the trend of the global disease burdens and predict future burdens, we conducted time series modeling using forecasting tools. We combined the auto-regressive and moving average models in the so-called ARIMA model (Auto-Regressive Integrated Moving Average). See Forecasting: Principles and Practice for a review of ARIMA analysis. First, a cross-validation is performed to assess the validity of the constructed model; we then predict future burdens and funding.
To perform the cross-validation, data from 1990 to 2010 were used to predict data from 2011 to 2017. The predicted results for 2011 to 2017 were compared to the actual values.
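The holdout scheme can be sketched as follows. To keep the example self-contained, the forecast uses a random walk with drift, which is the ARIMA(0,1,0)-with-drift special case, as a stand-in for the fitted ARIMA models. The series is synthetic and all function names are hypothetical.

```python
def split_by_year(series, last_train_year):
    """Split a {year: value} series into training and test parts."""
    train = {y: v for y, v in series.items() if y <= last_train_year}
    test = {y: v for y, v in series.items() if y > last_train_year}
    return train, test


def drift_forecast(train, forecast_years):
    """Random walk with drift: last value plus the average yearly change."""
    years = sorted(train)
    first, last = years[0], years[-1]
    drift = (train[last] - train[first]) / (last - first)
    return {year: train[last] + (year - last) * drift for year in forecast_years}


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(actual[y] - predicted[y]) for y in actual) / len(actual)


# Synthetic annual burden series for 1990-2017 (illustrative only).
series = {year: 100.0 + 2.0 * (year - 1990) for year in range(1990, 2018)}

train, test = split_by_year(series, last_train_year=2010)
predicted = drift_forecast(train, sorted(test))
error = mean_absolute_error(test, predicted)
```

The same split and error calculation apply unchanged if `drift_forecast` is replaced by forecasts from a fitted ARIMA model.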
With the exception of Lymphatic Filariasis, the cross-validation predictions for all disease burdens appear to follow the pattern of the actual data. In addition, all predicted values are within the confidence interval.
Given that YLL data were not available for all diseases, predictions were conducted for only 6 diseases, and the cross-validation was accurate. We note, however, that Echinococcus has a constant term, meaning that predicting into the future may lead to a large error.
Cross-validation for YLDs shows a trend similar to that for DALYs. However, the values predicted for Schistosomiasis appear to be constant, which suggests that the model may not perform well for this disease.
The mortality cross-validation also has some missing diseases, since some of the neglected diseases cause a large burden but are not generally deadly. We note that diseases such as Leishmaniasis and Chagas appear to show a constant prediction (i.e. p and q from the ARIMA model are equal to 0).
Data were not available for the incidence of Hookworm, Lymphatic Filariasis, Trichuriasis, and Schistosomiasis, so these diseases were not included in this analysis. However, the cross-validation accuracy was good for all diseases with available data.
We generally observed good predictions for prevalence. However, Lymphatic Filariasis appears to be predicted very poorly, with predictions falling outside the 99% confidence interval.
Comparing the funding data to the burden data, it is easy to see that the funding data are much noisier. Because of this high level of noise, the model, being univariate, was not able to predict well. Most predictions were constant. This could be addressed either by providing more time points or by using a multivariate statistical approach.
We note from all predictions that the optimal parameter values for p and q are 0 or 1. This yields a constant prediction. We hypothesize that this is due to our short time series. It is recommended to use at least 50 time points when fitting an ARIMA model, so having more time points may correct this issue. Moreover, given that ARIMA is a univariate model, there could be other variables not taken into account that affect our predictions. A multivariate time series model that incorporates at least both funding and burden may provide better accuracy. We therefore conclude that the predictions into the future might not be constant, as observed in this work, if one were to use more time series data together with a multivariate statistical approach.
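To see why p = q = 0 gives flat forecasts, consider the two simplest cases. With no differencing, ARIMA(0,0,0) with a constant forecasts the training mean at every horizon. With one difference and no drift, ARIMA(0,1,0) forecasts the last observed value at every horizon. The sketch below shows only these point forecasts. The function names and data are hypothetical.

```python
from statistics import mean


def forecast_arima_000(train, horizon):
    """ARIMA(0,0,0) with a constant: every forecast equals the training mean."""
    return [mean(train)] * horizon


def forecast_arima_010(train, horizon):
    """ARIMA(0,1,0) without drift (random walk): every forecast equals the last value."""
    return [train[-1]] * horizon


# Hypothetical noisy funding series.
train = [4.0, 6.0, 5.0, 9.0]

flat_mean = forecast_arima_000(train, horizon=3)
flat_last = forecast_arima_010(train, horizon=3)
```

In both cases the model carries no autoregressive or moving average structure that could bend the forecast, so any trend has to come from a drift term or from additional explanatory variables.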
In what follows, we continue our analysis by focusing on the text variables for specific initiatives. Our analysis focused on the Tropical Medicine Research Centers initiative (RFA AI16-002, RFA AI00-009, RFA AI06-006, and RFA AI11-001). The data were queried in isearch using the term “disease burden”, resulting in 613 publications. We focused the analysis on “title”, “Mesh extracted”, “Abstracts”, and “Condition”. We aim to identify the most common organisms and the disease burden metrics that occur most often.
From this analysis, we note that the most common organisms studied under this initiative are Leishmaniasis, Chagas, and Schistosomiasis, while prevalence, incidence, and mortality are some of the most common measures used to study these organisms.
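A simple version of this kind of term counting is sketched below. It counts, for each term, the number of records whose selected text fields mention it. The field names follow those listed above, while the records, the term list, and the function name are made up for illustration and do not reflect the actual query results.

```python
import re
from collections import Counter


def count_records_mentioning(records, fields, terms):
    """Count how many records mention each term in any of the given fields."""
    counts = Counter()
    for record in records:
        text = " ".join(record.get(field, "") for field in fields).lower()
        for term in terms:
            # Whole-word, case-insensitive match.
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
                counts[term] += 1
    return counts


# Hypothetical publication records.
records = [
    {"title": "Prevalence of leishmaniasis in Brazil",
     "Abstracts": "We estimate prevalence and incidence."},
    {"title": "Chagas disease mortality",
     "Abstracts": "Mortality and prevalence trends."},
]
fields = ["title", "Mesh extracted", "Abstracts", "Condition"]
terms = ["prevalence", "incidence", "mortality", "leishmaniasis", "chagas"]

counts = count_records_mentioning(records, fields, terms)
most_common_term = counts.most_common(1)
```

Counting records rather than raw occurrences keeps a single long abstract from dominating the ranking.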